Abstract

We study a semi-random model for k-SAT involving an Achlioptas-process version of
the random k-SAT process: a bounded number of k-CNF clauses are drawn uniformly at
random at each step, and exactly one is added to the growing formula according to a particular
rule. We show the validity of the model by proving the existence of a rule that shifts
the satisfiability threshold. This extends a well-studied area of probabilistic combinatorics
(Achlioptas processes) to random CSPs. In particular, while a rule to delay the 2-SAT
threshold was known previously, this is the first proof of a rule to shift the threshold of a
CSP that is NP-hard.

We then propose a gap decision problem based upon this semi-random model. The aim 
of the problem is to investigate the hardness of the random k-SAT decision problem, as
opposed to the problem of finding an assignment or certificate of unsatisfiability. While the 
usual decision problem is trivial in some respects because of the sharp threshold, in this 
semi-random model, with an adversary who can shift the threshold, the problem becomes 
relevant. Finally, we discuss connections to the study of Achlioptas random graphs. 



1 Introduction 



The mathematical study of phase transitions and threshold behavior in random structures began
with Erdős and Rényi's first paper on random graphs [22]. In it, they showed that for any fixed
$\epsilon > 0$, if the number of edges in the random graph is at most $(1/2 - \epsilon)n$ then the largest
connected component is of size $O(\log n)$ with probability $1 - o(1)$ (whp), and if the number of
edges is at least $(1/2 + \epsilon)n$, then the largest component is of size $\Theta(n)$ whp. The existence
of a giant, linear-sized component thus exhibits what is known as a sharp threshold, with its
probability rising from near 0 to near 1 with the addition of a sublinear number of additional
edges. This is as opposed to the coarse threshold for the presence of a triangle in a random
graph, a property whose probability is strictly bounded away from 0 and 1 for any linear number
of edges. Since that initial paper, the random graph phase transition has been studied in great
detail; it is now known that the scaling window has width $n^{2/3}$ [13], the structure of both the
giant component and the smaller components is well understood, and many modifications of the
original model have been studied. The Erdős-Rényi random graph still plays a central role in
discrete probability, both as an object in its own right and as a tool to solve other problems.

In theoretical computer science, the threshold phenomenon that has attracted the most study 
is the unsatisfiability threshold in the random k-SAT model. An instance of random k-SAT on n
variables and m clauses consists of the conjunction of m clauses, each of which is the disjunction
of k literals and chosen uniformly at random from the set of $\binom{n}{k} 2^k$ possible clauses. Small
variations in the description of the model, such as the difference between adding clauses with or
without replacement, are not significant with respect to the threshold behavior.

In [27] Friedgut proved that the satisfiability threshold is sharp. In particular, he proved
that there exists a sequence $r_k(n)$ so that for every $\epsilon > 0$,

$$\lim_{n \to \infty} \Pr[\Phi_{(r_k(n) - \epsilon)n} \text{ is satisfiable}] = 1 \quad \text{and} \quad \lim_{n \to \infty} \Pr[\Phi_{(r_k(n) + \epsilon)n} \text{ is satisfiable}] = 0,$$

where $\Phi_{rn}$ is a uniformly random k-SAT formula with rn clauses. It remains an open problem
whether or not the sequence $r_k(n)$ has a limit; this is sometimes referred to as the 'satisfiability
threshold conjecture'. Much work has been done on proving upper and lower bounds on $r_k(n)$;
the current best upper and lower bounds for k = 3 are 4.508 [21] and 3.52 [31], [28] respectively.
See [3] for best current bounds for other values of k and a survey of the problem. Typically
upper bounds come from a variant of the first-moment method, while lower bounds come from
either analyzing an algorithm [16], [17], [1] or the second-moment method [4].

A second central open question in this area is whether or not random k-SAT formulae ($k \ge 3$)
at or near the satisfiability threshold are computationally hard (2-SAT is solvable in polynomial
time in the worst case). Selman, Mitchell and Levesque [37] gave experimental evidence that
near the threshold it is in fact difficult to determine satisfiability. But in some sense, the decision
version of the random k-SAT problem is trivially easy. Above the threshold density we can say
'unsatisfiable' and below the threshold say 'satisfiable' and be correct with high probability
without even looking at the sampled instance. Because of this, the study of the hardness of
random k-SAT has turned towards finding certificates of unsatisfiability for clause densities
above but as close as possible to the threshold [23], [25], [lU].

In this work we consider a variant of the random k-SAT problem for which the decision
problem is still relevant. Our initial motivation came from the work on Achlioptas processes in 
the study of random graphs. Dimitris Achlioptas initiated this study by asking whether, given 
the choice of two random edges at each step of a random graph process, one could delay the 
phase transition by a constant factor. Formally an Achlioptas random graph process is defined 
as follows: 

• Begin at step 0 with an empty graph on n vertices.

• At step i, two uniformly random edges are presented, and exactly one of the two is selected 
according to a given rule and added to the graph.

• The choice of edge can depend on the edges presented, the current graph, and the history 
of the process, but not on the edges to be presented in subsequent steps. 

His original question was whether there is a rule for choosing one of the two edges so that at 
step (1/2 + e)n the graph contains no linear-sized connected component whp. His question was 
motivated by the 'power of two choices' in load balancing [6], [35] and was answered affirmatively 
by Bohman and Frieze [9] in the first of many papers studying Achlioptas processes. While the
phase transition has received the most attention ([12], [26], [11]), rules for shifting the threshold
of other properties have also been found (e.g. Hamiltonian cycles [33] or small subgraphs [32],
[36]). One primary aim of the study of Achlioptas processes is to understand which qualitative
properties of the phase transition are robust under small modifications of the model. In [29],
[30], [7] it is shown that certain critical exponents are universal for a large class of Achlioptas
rules. 

Sinclair and Vilenchik [38] first considered the Achlioptas process model with regard to a
random CSP. In particular, they exhibit a rule for choosing one of two random clauses that
delays unsatisfiability for random 2-SAT by a constant factor. They also consider 'off-line' rules
for random k-SAT and on-line rules for k-SAT with $k = \omega(\log n)$.

In this work we study the Achlioptas-process version of random k-SAT (see Section 2 for a
formal definition), and show in Section 3 that in fact the satisfiability threshold can be shifted,
for any k, and in particular for the computationally interesting cases $k \ge 3$. Studying this
semi-random model of k-SAT is a step towards understanding the standard model better, as
has been done in the case of random graphs, but we also argue that this model is particularly
relevant for k-SAT because of the computational aspect.

We aim to address the question of the hardness of the random k-SAT decision problem. To
avoid the triviality of almost sure satisfiability below the threshold and almost sure unsatisfiability
above the threshold, we use the semi-random model of k-SAT described above and propose
an accompanying decision problem in Section 4.

To sum up, this paper makes four main contributions: 

1. We introduce a semi-random model of k-SAT and show that with respect to the satisfiability
threshold the model is not trivial.

2. We propose a gap decision problem for this semi-random model that may be more approachable
than random k-SAT from the perspective of computational complexity.

3. We prove that a specific Achlioptas rule can shift the satisfiability threshold of random k-SAT for
all $k \ge 2$. This was previously known only for the case of 2-SAT.

4. The rule we choose and our method of proof also show that biased random k-SAT formulas
are easier to satisfy than unbiased ones.

Notation 

The set of binary variables on which our random formulae are built is $\{x_1, \dots, x_n\}$, and all
asymptotics are as $n \to \infty$. A literal is a variable $x_i$ or its negation $\bar{x}_i$, and we will denote literals
with the letter w. A k-clause is a disjunction of k literals, $(w_{i_1} \vee w_{i_2} \vee \dots \vee w_{i_k})$. A formula of
m clauses is the conjunction of m k-clauses and is satisfiable if there exists an assignment to the
n variables that satisfies each of the m clauses. We will write $\Phi_m$ for a formula of m clauses.
We write that an event E holds with high probability or whp if $\Pr[E] \to 1$ as $n \to \infty$.

2 Semi-Random k-SAT: The Model

Here we define an $l$-clause Achlioptas k-SAT process analogously to an Achlioptas random graph
process:

1. Begin at step 0 with an empty formula, $\Phi_0 = \emptyset$.

2. At each step, $l$ clauses are selected uniformly at random, with replacement, from the
$\binom{n}{k} 2^k$ possible k-CNF clauses.

3. According to a fixed rule R, exactly one of the $l$ clauses is chosen and added to the current
formula: $\Phi_i = \Phi_{i-1} \wedge \phi_i$, where $\Phi_{i-1}$ is the current formula and $\phi_i$ is the clause chosen at
step i.

Note that different rules R lead to different processes (and different distributions over formulas
$\Phi_m$ at step m). The rule 'Always select the first clause' leads to the classic random k-SAT
distribution. The rule R can be a function of the $l$ clauses presented, the current formula
and the entire history of presented clauses up to step i. The rule can also use randomness. The
rule, however, cannot be a function of the clauses presented in subsequent steps (such rules,
while not standard Achlioptas processes, are called 'off-line' rules and have been studied for
both random graphs [11] and k-SAT formulae [38]). The rules we analyze below will be
simpler: they will be functions of only the current formula and the $l$ clauses presented in the
current round.
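The process defined above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the paper: the literal encoding (signed integers), the function names, and the convention that a rule returns the index of the chosen clause are all assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple

# Assumed encoding: literal +v is variable v, literal -v is its negation.
Clause = Tuple[int, ...]
# A rule sees the offered clauses and the current formula, and returns an index.
Rule = Callable[[List[Clause], List[Clause]], int]


def random_clause(n: int, k: int, rng: random.Random) -> Clause:
    """Draw a k-clause on k distinct variables with independent uniform signs."""
    variables = rng.sample(range(1, n + 1), k)  # distinct variables, random order
    return tuple(v if rng.random() < 0.5 else -v for v in variables)


def achlioptas_process(n: int, k: int, l: int, steps: int,
                       rule: Rule, seed: int = 0) -> List[Clause]:
    """Run an l-clause process for the given number of steps."""
    rng = random.Random(seed)
    formula: List[Clause] = []
    for _ in range(steps):
        # l clauses are offered, drawn independently (i.e. with replacement).
        offered = [random_clause(n, k, rng) for _ in range(l)]
        # The rule may inspect the offered clauses and the current formula only.
        chosen = rule(offered, formula)
        formula.append(offered[chosen])
    return formula


def first_clause_rule(offered: List[Clause], formula: List[Clause]) -> int:
    """'Always select the first clause': recovers ordinary random k-SAT."""
    return 0
```

For example, `achlioptas_process(1000, 3, 2, 4000, first_clause_rule)` would produce a formula distributed as a standard random 3-SAT formula with 4000 clauses on 1000 variables.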

Semi-random models of NP-hard CSPs have been proposed and studied before. Blum and
Spencer [8] describe a semi-random model for graph coloring in which an adversary's edge choices
are reversed at random with some probability, and they give algorithms for coloring in a range
of the parameters. Feige and Krauthgamer [21] give an algorithm for finding planted cliques in a
semi-random graph in which an adversary is allowed to remove arbitrary edges from among the
non-planted edges. In [10] Bohman et al. consider smoothed graphs, with random edges added
to arbitrary graphs, and they determine how many random edges must be added for various
monotone graph properties to hold whp. Krivelevich, Sudakov and Tetali [34] consider Ramsey
properties in this model and study an analogous smoothed model of k-SAT formulae.

3 Results 

Our first result is that the satisfiability threshold for random k-SAT can be delayed with an
$l$-clause rule, for constant $l$.

Theorem 1. For every integer $k \ge 2$, there exists an integer $l$ and an $l$-clause Achlioptas rule
for random k-SAT so that with probability $1 - o(1)$, the formula generated after $2^{k+1} \ln 2 \cdot n$ steps
is satisfiable.

In particular, since the threshold for random k-SAT is at most $2^k \ln 2$, we show that with
bounded choice, unsatisfiability can be delayed by a constant factor.

Next we specialize to the case k = 3. 

Theorem 2. There exists a 5-clause Achlioptas rule for random 3-SAT so that with probability 
$1 - o(1)$, the formula generated after 5.065n steps is satisfiable.

In particular, 5.065 is above the best upper bound for the random 3-SAT threshold, and so 
we have in fact shifted the threshold. 

For the case k = 2, we improve the constant factor of delay for a 2-clause rule in the results 
of [38] with a different rule and a different proof. 

Theorem 3. There is a 2-clause Achlioptas rule for random 2-SAT that generates a formula 
that, after 1.055n steps, is satisfiable whp. 

The proofs will follow in Section 5.

4 Semi-Random Gap k-SAT

Once we know that an Achlioptas rule can change the satisfiability threshold, the following 
decision problem becomes meaningful. 

Let $\Phi_i$, $i = 1, 2, \dots$, be a growing semi-random 3-SAT formula generated according to a 2-clause
Achlioptas rule R. We want to distinguish between the following two cases: 

• NO: At step 4n, the formula is unsatisfiable. 

• YES: At step 5n, the formula is satisfiable. 

If neither condition holds, then either answer is accepted. 

Question 1. Is there an efficient algorithm that for all rules R gives an acceptable answer to 
the above problem with probability $1 - o(1)$?

We can generalize the above problem to k-SAT by varying three parameters: the number
of random clauses presented at each step ($l$), the lower threshold ($c_1$), and the upper threshold
($c_2$). We have chosen $k = 3$, $l = 2$, $c_1 = 4$, $c_2 = 5$ for simplicity. The problem becomes harder as
$l$ increases, and easier as either $c_2$ increases or $c_1$ decreases. If the adversary had no choice, and
a random clause was added at each step, the problem would be easy: the satisfiability threshold
would occur at $r_k(n)$, and if $r_k(n) < c_1$, NO would be acceptable whp; if $r_k(n) > c_2$, YES would
be acceptable whp; and otherwise either answer would be acceptable. But because we show that the
adversary can in fact shift the threshold, the problem is no longer trivial.

One way to interpret Achlioptas' original question is whether, under a specific model of 
semi-random graphs, the phase transition occurs when the average degree of a vertex hits 1 as 
it does in the Erdős-Rényi random graph. Bohman and Frieze answered 'no' to this question,
and in some sense this shows that average degree 1 in the standard model is an artifact of the
independence and uniformity of the random edges. However, the study of Achlioptas processes
has identified a different statistic, rather than the average degree, that does control the phase
transition for a large subclass of Achlioptas processes. This is the susceptibility, or the average
component size in the graph: $S(G) = \frac{1}{n} \sum_v |C(v)|$, where $C(v)$ is the connected
component containing the vertex v. Bohman and Kravitz [12] and Spencer
and Wormald [39] show that for the class of 'bounded-size' rules, the blow-up point of an ODE 
tracking the growth of S(G) marks the critical point for the phase transition. 
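As an illustration of the statistic, the susceptibility of a graph can be computed with a union-find structure. The sketch below is not taken from the cited papers; it assumes vertices are labeled 0 to n-1 and uses the identity that summing $|C(v)|$ over all vertices equals summing the squared component sizes. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def susceptibility(n: int, edges: Iterable[Tuple[int, int]]) -> float:
    """Average over vertices v of the size of the component containing v."""
    parent: List[int] = list(range(n))

    def find(v: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Each component of size s contributes s vertices, each with |C(v)| = s.
    sizes = Counter(find(v) for v in range(n))
    return sum(s * s for s in sizes.values()) / n
```

On the empty graph every component has size 1, so the susceptibility is 1; it grows as components merge.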

The susceptibility allows one to understand where and why the phase transition occurs in 
Achlioptas random graph processes but is not needed algorithmically, since detecting a giant 
component is already an easy computational problem. In the case of k-SAT however, if there
were such a statistic, correlated with the threshold, which was efficiently computable, then the
decision problem version of random k-SAT would be tractable for a non-trivial reason.

5 Proofs 

Theorem 1 is a corollary of the following lemma:

Lemma 1. For fixed integers $k \ge 2$ and $l \ge 2$, there exists an $l$-clause Achlioptas rule for random
k-SAT which creates a formula that, for every $\epsilon > 0$, is satisfiable whp after $(r(k, l) - \epsilon)n$ steps,
where

$$r(k, l) = \frac{2^{kl/2}}{4 (k+1)^{(l-1)/2}}. \tag{1}$$

In particular, for every k there is an $l$ large enough so that there exists an $l$-clause rule that
delays unsatisfiability.

We start by giving an idea of the strategy we will use to delay unsatisfiability. Consider a
biased random 3-SAT formula generated by adding m = rn clauses, with each clause selected
as follows: with probability p, choose a clause uniformly from all 3-clauses with 3 positive literals,
and with probability $1 - p$ choose a clause uniformly from all 3-clauses. Let $\Phi_{rn}$ denote that
random formula. This formula is biased in favor of assignments with more +1's than -1's. Let
$Z_\beta$ be the number of satisfying assignments to $\Phi_{rn}$ with $\beta n$ +1's and $(1 - \beta)n$ -1's, and let $x_\beta$
be the particular assignment that assigns the first $\beta n$ variables +1 and the rest -1. Then

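$$\frac{1}{n} \log \mathbb{E} Z_\beta = \frac{1}{n} \log \left( \binom{n}{\beta n} \Pr[x_\beta \text{ satisfies } \Phi_{rn}] \right) = H(\beta) + r \log \Pr[x_\beta \text{ satisfies } \Phi_1] + o(1),$$

where $H(\beta)$ is the binary entropy function, and

$$\Pr[x_\beta \text{ satisfies } \Phi_1] \approx p \, (1 - (1 - \beta)^3) + (1 - p) \cdot 7/8.$$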

If we pick r small enough so that $\max_{\beta \in (0,1)} H(\beta) + r \log[p(1 - (1 - \beta)^3) + (1 - p) \cdot 7/8] > 0$, then
there will be exponentially many satisfying assignments in expectation. One can show that for
any $p > 0$, there is some such r larger than $\log 2 / \log(8/7) \approx 5.19$, which is the simple first-moment
upper bound for random 3-SAT. And in fact for any $r > 0$, there is a $p \in (0, 1)$ so that
$\max_{\beta \in (0,1)} H(\beta) + r \log[p(1 - (1 - \beta)^3) + (1 - p) \cdot 7/8] > 0$, which shows that with enough bias,
the first-moment bound can be pushed arbitrarily high. This complements results on random
regular k-SAT [15], in which an extreme lack of bias, with every literal having the same degree,
leads to an earlier threshold.
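The first-moment heuristic above is easy to explore numerically. The following sketch is an illustration only: it evaluates the exponent $H(\beta) + r \log[p(1-(1-\beta)^3) + (1-p) \cdot 7/8]$ on a grid of $\beta$ values (natural logarithms throughout) and bisects for the largest density r at which the maximum is still positive. The grid size, the search interval, and all function names are assumptions of the example, and the bisection is only meant for bias values p well below 1.

```python
import math


def entropy(b: float) -> float:
    """Binary entropy with natural logarithms."""
    if b <= 0.0 or b >= 1.0:
        return 0.0
    return -b * math.log(b) - (1 - b) * math.log(1 - b)


def clause_sat_prob(beta: float, p: float) -> float:
    """Approximate probability that x_beta satisfies one biased random 3-clause."""
    return p * (1 - (1 - beta) ** 3) + (1 - p) * 7 / 8


def first_moment_exponent(r: float, p: float, grid: int = 2000) -> float:
    """Maximum over a grid of beta in (0, 1) of the first-moment exponent."""
    return max(
        entropy(j / grid) + r * math.log(clause_sat_prob(j / grid, p))
        for j in range(1, grid)
    )


def first_moment_bound(p: float, hi: float = 200.0, iters: int = 50) -> float:
    """Largest r (up to bisection error) with positive exponent.

    Assumes 0 <= p <= 0.9 so that the exponent is negative at r = hi.
    The exponent is decreasing in r, which is what makes bisection valid.
    """
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if first_moment_exponent(mid, p) > 0:
            lo = mid
        else:
            hi = mid
    return lo
```

With p = 0 the maximum is attained at $\beta = 1/2$ and the function recovers $\log 2 / \log(8/7)$; increasing p moves the bound to higher densities.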

Having exponentially many solutions in expectation does not imply a single solution with 
significant probability, so to prove the lemma we will use a related but different rule. This rule 
will select one of the $l$ clauses presented at each step as follows:

• If one of the first $l - 1$ clauses contains at least two positive literals, add it. (If there is
more than one such clause, add the first.)

• Otherwise add clause $l$.
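The two bullet points above translate directly into code. In this sketch, clauses are assumed to be tuples of signed integers (a positive integer for a positive literal), and the rule returns the index of the selected clause; both conventions and the function names are choices made for the illustration.

```python
from typing import List, Optional, Sequence, Tuple

Clause = Tuple[int, ...]


def count_positive(clause: Clause) -> int:
    """Number of positive literals in a clause."""
    return sum(1 for lit in clause if lit > 0)


def biased_rule(offered: Sequence[Clause],
                formula: Optional[List[Clause]] = None) -> int:
    """Index of the clause to add among the l offered clauses.

    The current formula is accepted as an argument only for interface
    compatibility; this rule ignores it.
    """
    # Scan the first l - 1 clauses for one with at least two positive literals.
    for index, clause in enumerate(offered[:-1]):
        if count_positive(clause) >= 2:
            return index  # the first such clause wins
    # Otherwise fall back to the last clause, whatever it is.
    return len(offered) - 1
```

Note that the last clause is accepted unconditionally, so it is a uniformly random clause whenever it is chosen.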



The effect of this rule is to bias the formula in favor of majority +1 assignments as above.
Let $r = r(k, l) - \epsilon$. To prove that this rule produces a satisfiable formula whp, we will begin by
taking the formula at step rn and converting it into a 2-SAT formula. For each clause with two
or more positive literals, we keep a 2-CNF clause with only the first two positive literals; for each
clause with exactly one positive literal, we keep a 2-clause with the positive literal and the first
negative literal; and for each clause with all negative literals, we keep a 2-clause with the first
two literals. This gives a 2-SAT formula with rn clauses, call it $F_2(rn)$. If $F_2(rn)$ is satisfiable,
the original formula is also satisfiable since each 2-clause is a sub-clause of the corresponding
k-clause. Each clause is also added independently, and the probability it has at least two, exactly
one, or zero positive literals is, respectively:



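$$p_2 = 1 - \left( \frac{k+1}{2^k} \right)^l$$

$$p_1 = \left( \frac{k+1}{2^k} \right)^l \frac{k}{k+1}$$

$$p_0 = \left( \frac{k+1}{2^k} \right)^l \frac{1}{k+1}$$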
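These three probabilities can be checked with exact rational arithmetic. The sketch below computes them in closed form, using the fact that a uniformly random k-clause has fewer than two positive literals with probability $(k+1)/2^k$, and compares against a brute-force enumeration of all sign patterns of the $l$ offered clauses. The function names are hypothetical, and the enumeration is only practical for small k and $l$.

```python
from fractions import Fraction
from itertools import product
from typing import Tuple

Probs = Tuple[Fraction, Fraction, Fraction]


def clause_type_probs(k: int, l: int) -> Probs:
    """Closed form (p2, p1, p0) for the selected clause."""
    # Probability that all l offered clauses have fewer than two positive literals.
    base = Fraction(k + 1, 2 ** k) ** l
    return 1 - base, base * Fraction(k, k + 1), base * Fraction(1, k + 1)


def brute_force_probs(k: int, l: int) -> Probs:
    """Same probabilities by enumerating every sign pattern of the l clauses."""
    patterns = list(product([False, True], repeat=k))  # True = positive literal
    counts = [0, 0, 0]  # indexed by min(number of positive literals, 2)
    for offered in product(patterns, repeat=l):
        chosen = offered[-1]  # default: the last clause
        for signs in offered[:-1]:
            if sum(signs) >= 2:
                chosen = signs  # first early clause with two positive literals
                break
        counts[min(sum(chosen), 2)] += 1
    total = len(patterns) ** l
    return (Fraction(counts[2], total),
            Fraction(counts[1], total),
            Fraction(counts[0], total))
```

For $k = 3$, $l = 5$ the closed form gives 31/32, 3/128 and 1/128, and for $k = 2$, $l = 2$ it gives 7/16, 6/16 and 3/16, the values used for Theorems 2 and 3.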

Also, each clause with i positive literals is distributed uniformly over all 2-clauses with i positive
literals. As is standard in studying random graphs and random CSPs, we will consider the model
of random 2-CNF formulas in which each of the $\binom{n}{2}$ possible clauses with two positive literals
is present in a random formula $F_2(r, n)$ independently with probability $q_2$, each of the $n(n-1)$
clauses with one positive and one negative literal is present independently with probability $q_1$, and
each of the $\binom{n}{2}$ clauses with two negative literals is present independently with probability $q_0$. If we set

$$q_2 = \frac{2 p_2 r}{n}, \qquad q_1 = \frac{p_1 r}{n}, \qquad q_0 = \frac{2 p_0 r}{n},$$

then proving satisfiability whp of $F_2(r, n)$ implies satisfiability whp of $F_2(rn)$ (see [13], Appendix
A, for details of the equivalent behavior of the two models).

To study the satisfiability of $F_2(r, n)$ we will use an approach of [18] and [20], following [5],
where it is noted that if there is no bicycle in a formula's 'implication graph', the formula is
satisfiable. The vertices of the implication graph are the 2n literals, and for each clause $(w_i \vee w_j)$
in the formula, we add two directed edges, $(\bar{w}_i \to w_j)$ and $(\bar{w}_j \to w_i)$, to the graph. A bicycle of
length k is a sequence of k literals of distinct variables, $w_1, w_2, \dots, w_k$, where the $k - 1$ directed
edges $(w_1 \to w_2), (w_2 \to w_3), \dots, (w_{k-1} \to w_k)$ are present in the implication graph, as well as
two additional directed edges $(u \to w_1)$ and $(w_k \to v)$ for $u, v \in \{w_1, \dots, w_k, \bar{w}_1, \dots, \bar{w}_k\}$.
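The implication graph is simple to build in code. The sketch below constructs it for a 2-CNF formula and then decides satisfiability exactly, using the standard criterion that a 2-CNF formula is unsatisfiable if and only if some variable lies in the same strongly connected component as its negation. This is a different (exact) test from the bicycle condition used in the proof, which is only a sufficient condition; the literal encoding and function names are assumptions of the example.

```python
from typing import List, Sequence, Tuple


def node(lit: int) -> int:
    """Map literal +v / -v (v >= 1) to a vertex index in 0 .. 2n - 1."""
    return 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)


def implication_graph(n: int, clauses: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Adjacency lists: clause (a or b) adds edges (not a -> b) and (not b -> a)."""
    adj: List[List[int]] = [[] for _ in range(2 * n)]
    for a, b in clauses:
        adj[node(-a)].append(node(b))
        adj[node(-b)].append(node(a))
    return adj


def two_sat_satisfiable(n: int, clauses: Sequence[Tuple[int, int]]) -> bool:
    """Kosaraju's algorithm (iterative) on the implication graph."""
    adj = implication_graph(n, clauses)
    radj: List[List[int]] = [[] for _ in adj]
    for u, outs in enumerate(adj):
        for v in outs:
            radj[v].append(u)

    # First pass: record vertices in order of DFS completion.
    order: List[int] = []
    seen = [False] * (2 * n)
    for start in range(2 * n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(adj[u]):
                stack.append((u, i + 1))
                v = adj[u][i]
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, 0))
            else:
                order.append(u)

    # Second pass: label components on the reversed graph.
    comp = [-1] * (2 * n)
    label = 0
    for start in reversed(order):
        if comp[start] != -1:
            continue
        comp[start] = label
        stack2 = [start]
        while stack2:
            u = stack2.pop()
            for v in radj[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack2.append(v)
        label += 1

    # Unsatisfiable exactly when x and not-x share a component.
    return all(comp[2 * v] != comp[2 * v + 1] for v in range(n))
```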

To show that there are no bicycles, we proceed as in [20], and consider first directed paths of
length $\ge L = K \epsilon^{-1} \log n$. Note that a clause with two positive literals adds two directed edges
from negative literals to positive literals; a clause with one negative and one positive literal adds
one directed edge from a positive to a positive literal and one from a negative to a negative;
and a clause with two negative literals adds two directed edges from positive literals to negative
literals. So in a path of k literals, if the signs of the literals switch along the path i times there
must be $k - 1 - i$ corresponding clauses with exactly one positive literal, and i corresponding
clauses with two or zero positive literals each, at least $(i - 1)/2$ of which must have 0 positive
literals. The probability that there is a directed path connecting a given sequence of L literals
with i sign changes is therefore

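$$\le q_1^{L-1-i} \, q_2^{(i+1)/2} \, q_0^{(i-1)/2},$$

where we have used the fact that $q_0 < q_2$. So the expected number of directed paths of length
L is bounded above by

$$2 n^L \sum_{i=0}^{L-1} \binom{L-1}{i} q_1^{L-1-i} \, q_2^{(i+1)/2} \, q_0^{(i-1)/2} \tag{2}$$

$$\le 4 n L \, r^{L-1} \max_{0 \le i \le L-1} \binom{L-1}{i} p_1^{L-1-i} (4 p_2 p_0)^{(i-1)/2}. \tag{3}$$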

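Now we consider bicycles of length $\le L$. The expected number $\mathbb{E}Y$ of bicycles of length at most
L is bounded above:

$$\mathbb{E}Y \le \sum_{k=2}^{L} n^k (q_2 + q_1 + q_0)^2 \, 8k^2 \sum_{i=0}^{k-1} \binom{k}{i} q_1^{k-1-i} \, q_2^{(i+1)/2} \, q_0^{(i-1)/2} \tag{4}$$

$$\le \sum_{k=2}^{L} n^k (q_2 + q_1 + q_0)^2 \, 8k^3 \max_{i \le k-1} \binom{k}{i} q_1^{k-1-i} \, q_2^{(i+1)/2} \, q_0^{(i-1)/2} \tag{5}$$

$$\le \frac{1}{n} \sum_{k=2}^{L} C k^3 r^{k-1} \max_{i \le k-1} \binom{k}{i} p_1^{k-1-i} (4 p_2 p_0)^{(i-1)/2}, \tag{6}$$

where C is a constant depending only on r and the $p_i$'s.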

Considering (3) and (6), we can make the expected number of paths of length L and bicycles
of length at most L both $o(1)$ by choosing the constant K in L large enough and by choosing r so
that for some $\delta > 0$ and k large:

$$r^k \max_{i \le k} \binom{k}{i} p_1^{k-i} (4 p_2 p_0)^{i/2} < (1 - \delta)^k. \tag{7}$$

As a rough bound, it suffices to have

$$r < \frac{1}{2 p_1} \quad \text{and} \quad r < \frac{1}{4 \sqrt{p_0}},$$

which translates to

$$r < \frac{2^{kl/2}}{4 (k+1)^{(l-1)/2}}. \tag{8}$$

With this choice of r, the expected number of bicycles in the implication graph is $o(1)$ and
therefore the formula is satisfiable whp.
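To see how many choices the rough bound requires, one can evaluate $r(k, l) = 2^{kl/2} / (4 (k+1)^{(l-1)/2})$ directly and search for the smallest $l$ that reaches the density $2^{k+1} \ln 2$ of Theorem 1. The sketch below is a small illustration; the function names and the default target are assumptions of the example.

```python
import math
from typing import Optional


def r_bound(k: int, l: int) -> float:
    """The density r(k, l) = 2^(kl/2) / (4 (k+1)^((l-1)/2))."""
    return 2 ** (k * l / 2) / (4 * (k + 1) ** ((l - 1) / 2))


def smallest_l(k: int, target: Optional[float] = None) -> int:
    """Smallest l >= 2 with r(k, l) >= target (default: 2^(k+1) ln 2).

    Terminates for k >= 2 because r(k, l) grows geometrically in l
    with ratio 2^(k/2) / sqrt(k+1) > 1.
    """
    if target is None:
        target = 2 ** (k + 1) * math.log(2)
    l = 2
    while r_bound(k, l) < target:
        l += 1
    return l
```

Because the bound is deliberately rough, the number of clauses it asks for is larger than what a sharper analysis needs; for instance, the closer look at condition (7) for $k = 3$ works with $l = 5$.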
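
To prove Theorem 2 we compute the $p_i$'s and consider (7) more closely. For $k = 3$, $l = 5$,

$$p_2 = \frac{31}{32}, \qquad p_1 = \frac{3}{128}, \qquad p_0 = \frac{1}{128}.$$

And thus from (7) we need for k large,

$$(1 - \delta)^k > r^k \max_{i \le k} \binom{k}{i} \left( \frac{3}{128} \right)^{k-i} \left( \frac{2\sqrt{124}}{128} \right)^{i} = \left( \frac{3r}{128} \right)^k \max_{i \le k} \binom{k}{i} \left( \frac{2\sqrt{124}}{3} \right)^{i},$$

and so we need

$$r < \frac{128}{3} \min_{\alpha \in [0,1]} \exp\left[ -H(\alpha) - \alpha \log\left( 2\sqrt{124}/3 \right) \right].$$

A numerical calculation shows that taking r = 5.065 is enough.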
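For Theorem 3 the $p_i$'s for the 2-clause, 2-SAT rule are:

$$p_2 = \frac{7}{16}, \qquad p_1 = \frac{6}{16}, \qquad p_0 = \frac{3}{16}.$$

And thus, similarly to the above, we need

$$r < \frac{8}{3} \min_{\alpha \in [0,1]} \exp\left[ -H(\alpha) + \frac{\alpha}{2} \log(3/7) \right],$$

which gives r = 1.055.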
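The numerical calculations for Theorems 2 and 3 can be reproduced with a short script. The sketch below minimizes $\exp[-H(\alpha) - \alpha \log c]$ over a grid of $\alpha$ values, where $c = 2\sqrt{p_2 p_0}/p_1$, and compares the result with the closed form $1/(p_1 + 2\sqrt{p_2 p_0})$ that follows from $\max_\alpha [H(\alpha) + \alpha \log c] = \log(1 + c)$. The grid size and function names are assumptions of the example.

```python
import math


def entropy(a: float) -> float:
    """Binary entropy with natural logarithms."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log(a) - (1 - a) * math.log(1 - a)


def density_bound(p2: float, p1: float, p0: float, grid: int = 20000) -> float:
    """(1 / p1) * min over alpha in [0, 1] of exp(-H(alpha) - alpha * log c)."""
    c = 2 * math.sqrt(p2 * p0) / p1
    best = min(
        math.exp(-entropy(j / grid) - (j / grid) * math.log(c))
        for j in range(grid + 1)
    )
    return best / p1


def closed_form(p2: float, p1: float, p0: float) -> float:
    """Equivalent closed form: 1 / (p1 + 2 sqrt(p2 p0))."""
    return 1 / (p1 + 2 * math.sqrt(p2 * p0))


if __name__ == "__main__":
    # k = 3, l = 5 (Theorem 2) and k = 2, l = 2 (Theorem 3).
    print(density_bound(31 / 32, 3 / 128, 1 / 128))
    print(density_bound(7 / 16, 6 / 16, 3 / 16))
```

Both evaluations come out slightly above the densities 5.065 and 1.055 stated in the theorems, which is what the strict inequalities require.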



6 Discussion and Open Problems 

We conclude with some remarks and open problems. 

A first question is whether there is a 2-clause rule to shift the k-SAT threshold for $k > 2$. It
is natural to conjecture that the rule used in the proof of Theorem 1 does in fact shift the
threshold for $l = 2$, since it does shift the first-moment upper bound. The difficulty lies in the
gap between the current upper and lower bounds on $r_k(n)$: to prove that the rule has shifted
the threshold for 3-SAT, we need to prove that it has shifted the threshold all the way past
4.508. What would be much more straightforward to prove would be that certain algorithms
(the unit-clause algorithm and its variants described in [2]) succeed at higher densities for this
rule than for random k-SAT.

We have discussed bounds on the satisfiability threshold of Achlioptas processes here and 
mentioned Friedgut's result on the sharpness of the k-SAT threshold. In fact the rule we analyze
can be shown to have a sharp threshold using Bourgain's sharp threshold criterion (Bourgain's
appendix to [27]). It would be interesting to determine which Achlioptas k-SAT processes have
a sharp threshold or if all rules for a fixed $l$ have a sharp threshold.

Question 2. For fixed $l$ and k, is there an $l$-clause Achlioptas rule for k-SAT that does not have
a sharp threshold? 

Next we note that the rules we analyze above all operate by biasing the formula to favor a 
particular assignment. 

Question 3. Can the k-SAT threshold be shifted by an Achlioptas rule that is symmetric with 
respect to assignments? 

One candidate for such a rule would be the following: 

• If all (or none, for the opposite effect) of the literals in the first clause appear in the current 
formula, add it. 

• Otherwise add the second clause. 
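The candidate rule can be written down as follows. This is a hypothetical sketch: it reads "the literals in the first clause appear in the current formula" as meaning that each signed literal already occurs in some clause of the formula, which is one possible interpretation, and the `prefer_seen` flag (an invented name) switches between the "all" and "none" variants.

```python
from typing import List, Sequence, Tuple

Clause = Tuple[int, ...]


def symmetric_candidate_rule(offered: Sequence[Clause],
                             formula: List[Clause],
                             prefer_seen: bool = True) -> Clause:
    """Pick between two offered clauses based on literals already in the formula."""
    first, second = offered
    # Recomputed from scratch for clarity; a real simulation would maintain
    # this set incrementally as clauses are added.
    seen = {lit for clause in formula for lit in clause}
    if prefer_seen:
        take_first = all(lit in seen for lit in first)
    else:
        take_first = all(lit not in seen for lit in first)
    return first if take_first else second
```

The rule treats a literal and its negation identically, so it does not favor any particular assignment.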

And finally, analogies to work done on the phase transition of Achlioptas random graphs 
suggest many avenues for future work. 